Abstract

The factors limiting the habitat range of species are crucial in understanding their biodiversity and response to environmental 
change. Yet the genetic and genomic architectures that produce genetic variation to enable environmental adaptation have 
remained poorly understood. Here we show that the proportion of duplicated genes (Pd) in the whole genomes of fully
sequenced Drosophila species is significantly correlated with environmental variability within the habitats measured by the 
climatic envelope and habitat diversity. Furthermore, species with a low Pd tend to lose the duplicated genes owing to their faster 
evolution. These results indicate that the rapid relaxation of functional constraints on duplicated genes resulted in a low Pd for
species with lower habitat diversity, and suggest that the maintenance of duplicated genes gives organisms an ecological
advantage during evolution. We therefore propose that the Pd in a genome is related to adaptation to environmental variation.
Introduction

The factors that constrain the evolution of habitat range are of
critical importance for understanding the evolution of biodi- 
versity and conservation because these factors are closely re- 
lated to historical processes creating the current biodiversity 
and to adaptation to current and future global climate 
changes (Root et al. 2003; Bridle and Vines 2007; Roy et al. 
2009). Even within closely related groups like Drosophila, 
some species have narrow restricted ranges and inhabit one 
or a few habitat types, whereas others have wider ranges and 
live in diverse environments (Kirkpatrick and Barton 1997). 
Kellermann et al. (2009) showed that low genetic variation in 
cold and desiccation tolerance limits the distributions of spe- 
cies. This indicates that a lack of genetic variation in key traits 
within a species is related to failure to expand in range and 
adapt to environmental change. However, what determines 
the ability of species to generate genetic variation (i.e., their 
evolvability) remains unknown. Restricted-range species 
might have genetic and genomic architectures that do not 
allow high variation. 

We focused on gene duplication as a source of genome- 
wide genetic variation. One system that produces and main- 
tains a large amount of genetic variation is buffering. By 
buffering the deleterious consequences of mutations, genetic 
variation can accumulate in a genome. An obvious mecha- 
nism for buffering genetic variation is redundancy, and one of 
the main factors that generates redundancy is gene duplica- 
tion (Wilkins 1997; Hartman et al. 2001). This type of muta- 
tion is particularly common in eukaryotes, and in yeast, 
duplication rates are reportedly faster than point mutation 
rates (Lynch et al. 2008). After gene duplication, one of the 
pair is redundant, and as such, the functional constraints are
relaxed, and one or both copies can differentiate as long as
their original function is maintained (Ohno 1970). Therefore, 
under these relaxed functional constraints, mutations are 
likely to accumulate in duplicated genes. Furthermore, the 
functional redundancy of duplicated genes can be maintained 
for extensive periods of time (Dean et al. 2008). Genetic var- 
iations in the duplicated genes within a population are likely 
to be maintained by their buffering effect (Wilkins 1997; 
Hartman et al. 2001). Therefore, gene duplication is a major
source of genetic variation. In fact, Kliebenstein showed that 
not only younger tandem duplicated genes but also older 
duplicated genes elevated intraspecific gene expression vari- 
ation in a population (Kliebenstein 2008). 

Previous studies reported that duplicated olfactory receptor
(Or) and gustatory receptor (Gr) genes were likely to be
lost in specialist Drosophila species with host specificity; how- 
ever, these studies focused only on the particular gene families 
associated with odor response, and did not consider the re- 
lationship between the duplicated gene content and habitat 
(McBride 2007; McBride et al. 2007). The habitats of special- 
ists are necessarily restricted by those of their hosts; hence, 
host specificity would be expected to be related to habitat 
(Markow and O'Grady 2007). If species with a larger proportion
of duplicated genes (Pd) have a greater potential to
generate genetic variation for more traits, they might show 
increased environmental adaptability. Namely, duplicated 
genes would have contributed to adaptation to diverse envi- 
ronments within the ranges of species. 

We propose that species with a higher duplicated gene 
content would have distribution ranges with higher environ- 
mental variability. We tested this hypothesis using Drosophila 
species that had been fully sequenced (Clark et al. 2007) and 
Fig. 1. Habitat distributions of Drosophila species. Habitat distributions of D. yakuba (pink), D. erecta (red), D. ananassae (purple), D. pseudoobscura
(light green), D. persimilis (green), D. willistoni (orange), D. mojavensis (yellow), and D. virilis (blue) are shown. Red arrows indicate islands inhabited by
island endemic species (D. sechellia and D. grimshawi). The habitat distribution of the cosmopolitan species D. melanogaster is shown in figure S1,
Supplementary Material online.



had documented habitat ranges (figs. 1 and S1,
Supplementary Material online) (Markow and O'Grady 
2006). Environmental variability within their habitat range 
(here referred to as habitat variability) was measured using 
two different indices, one relating to climatic envelope and 
the other to habitat diversity. We estimated their climatic 
envelopes using bioclimatic variables from WORLDCLIM 
(Hijmans et al. 2005). Variability of the Köppen climate classification
(Kottek et al. 2006) within their range was estimated as
habitat diversity using Brillouin's index as a measure of
species diversity (Margalef 1958; Legendre and Legendre 
1998). These indices were used as measurements to indicate 
the adaptability of species to environmental variability within 
their habitats. Examining the effect of duplicated genes on the 
expansion, contraction, and/or conservation of the habitat 
ranges of these species during their evolution allowed us to 
explore the importance of basal genetic diversity for 
adaptation. 

Materials and Methods 

Fully Sequenced Drosophila Species 
The genomes of 12 Drosophila species (Drosophila melanogaster,
D. sechellia, D. simulans, D. yakuba, D. erecta, D. ananassae,
D. pseudoobscura, D. persimilis, D. willistoni, D. mojavensis,
D. virilis, and D. grimshawi) have been fully sequenced (Clark 
et al. 2007). However, the coverage of the genome assemblies 
for D. simulans is comparatively poor. As a result, the number 
of identified orthologs of D. melanogaster in the D. simulans 
genome is relatively low (Heger and Ponting 2007), even 
though these are among the most closely related Drosophila species.
We therefore excluded D. simulans from our analyses.

Drosophila Gene Sequences 

The protein sequences corresponding to protein-encoding 
genes from the 11 Drosophila species were downloaded 
from the EnsemblMetazoa database, release 4
(http://metazoa.ensembl.org). In some cases, a non-melanogaster gene was
split into two genes as a result of sequence or assembly errors
(fig. S2A, Supplementary Material online). To minimize these
errors, we conducted a homology search using the Basic Local 
Alignment Search Tool (BLAST; D. melanogaster protein se- 
quences vs. non-melanogaster protein sequences), combined 
the sequences of physically neighboring genes in a non-mel- 
anogaster genome into one sequence when the neighboring 
genes did not show homology to each other (E value < 10~^), 
and identified the best hit for the same gene among those in 
D. melanogaster (table S1, Supplementary Material online).
The merged genes were treated as a single gene in this 
study. For non-melanogaster species, we also combined the 
nucleotide sequences of separate genes by the same process. 

Duplicated Genes 

To identify duplicated genes in the Drosophila genomes, we 
conducted an all-to-all BLAST search for all protein sequences 
used in this study. Genes with a homologue (E value < 10~^ 
and query coverage >30%) in the same species were identi- 
fied as candidate duplicated genes. Importantly, we found 
that our results were robust at different E value cut-offs 
(10~^ 10~^°, and 10"^°), and the trend in our results did 
not change as a result of different cut-off values. 
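
The filtering step described above can be illustrated with a minimal sketch. It assumes the all-to-all BLAST results have already been parsed into simple records; the `Hit` type, its field names, the gene names, and the cut-off values in the loop are all hypothetical placeholders rather than the settings or data of this study. The sketch also skips the later collapse of recent duplicates, so the proportion it prints is only a rough analogue of Pd.

```python
from collections import namedtuple

# Hypothetical record for one BLAST hit; the field names are illustrative.
Hit = namedtuple("Hit", ["query", "subject", "evalue", "query_coverage"])


def candidate_duplicates(hits, max_evalue, min_coverage=30.0):
    """Return genes with at least one non-self hit passing both filters."""
    duplicated = set()
    for hit in hits:
        if hit.query == hit.subject:
            continue  # every gene matches itself, so self-hits are ignored
        if hit.evalue < max_evalue and hit.query_coverage > min_coverage:
            duplicated.add(hit.query)
    return duplicated


def proportion_duplicated(hits, all_genes, max_evalue):
    """Proportion of candidate duplicated genes among all genes of one species."""
    return len(candidate_duplicates(hits, max_evalue)) / len(all_genes)


# Toy within-species hits (made-up values, not data from the study).
hits = [
    Hit("geneA", "geneA", 0.0, 100.0),
    Hit("geneA", "geneB", 1e-50, 85.0),
    Hit("geneB", "geneA", 1e-48, 90.0),
    Hit("geneC", "geneA", 1e-30, 12.0),  # fails the coverage filter
    Hit("geneD", "geneB", 0.5, 70.0),    # fails the E value filter
]
genes = ["geneA", "geneB", "geneC", "geneD"]

# Re-running with several placeholder cut-offs mimics a robustness check.
for cutoff in (1e-5, 1e-10, 1e-20):
    print(cutoff, proportion_duplicated(hits, genes, cutoff))
```

Keeping the E value cut-off as a parameter makes it straightforward to repeat the analysis under several thresholds and compare the trend.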

Synonymous Substitution Rate between Duplicated 
Genes 

To examine the distribution of gene duplication timing for 
the duplicated genes, we aligned the sequences of the dupli- 
cated gene pairs derived from the EnsemblMetazoa database 
by using the T-COFFEE multiple sequence alignment program 
(Notredame et al. 2000) and estimated the synonymous sub- 
stitution rate (Ks) between a duplicated gene and its closest 
paralogue by the Yang and Nielsen method (Yang and 
Nielsen 2000), which was implemented in the Phylogenetic 
Analysis by Maximum Likelihood (PAML) program package 
(Yang 1997). Note that the closest paralogues were deter- 
mined by identifying the best BLAST hit from the duplicated 
gene candidates. The distributions of Ks are shown in figure 
S3, Supplementary Material online. 
Gene Collapse Based on Sequence Similarity
There are many non-divergent duplicated gene pairs created
as a result of recent duplication events or assembly errors in a
genome. Homologous gene pairs with Ks < 0.1 were collapsed
into a single gene (a2-a3-a4 and b1-b2 in fig. S2B,
Supplementary Material online). The collapsed genes were 
classified as duplicated genes on the basis of a BLAST hit in 
comparison with genes not recently duplicated. If at least one 
gene in the homologous gene cluster (a2, a3, and a4 in fig. S2B, 
Supplementary Material online) had a duplicated gene part- 
ner (a1 in fig. S2B, Supplementary Material online; Ks > 0.1; E 
value < 10~^ and query coverage > 30%), the collapsed gene 
was defined as a duplicated gene. If not, the collapsed gene 
(b1-b2 in fig. S2B, Supplementary Material online) was defined
as a singleton. The duplicated genes in Drosophila are
summarized in table S2, Supplementary Material online. 
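
The collapse-and-classify rule above can be sketched as single-linkage clustering over low-Ks pairs. This is an illustrative reading, not the authors' implementation: the function names are hypothetical, the input `ks_pairs` is assumed to contain only homologous pairs that already passed the BLAST filters, and the gene labels follow the a1-a4 and b1-b2 example mentioned for fig. S2B.

```python
def collapse_recent_duplicates(genes, ks_pairs, ks_threshold=0.1):
    """Group genes linked by pairs with Ks below the threshold (single linkage)."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps lookups short
            g = parent[g]
        return g

    for (a, b), ks in ks_pairs.items():
        if ks < ks_threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return list(clusters.values())


def classify(clusters, ks_pairs, ks_threshold=0.1):
    """Label a collapsed cluster 'duplicated' if any member has a diverged
    partner (Ks above the threshold) outside the cluster, else 'singleton'."""
    labels = {}
    for cluster in clusters:
        has_partner = any(
            ks > ks_threshold and ((a in cluster) != (b in cluster))
            for (a, b), ks in ks_pairs.items()
        )
        labels[frozenset(cluster)] = "duplicated" if has_partner else "singleton"
    return labels


# Toy Ks values (made up) for homologous pairs that passed the BLAST filters.
genes = ["a1", "a2", "a3", "a4", "b1", "b2"]
ks_pairs = {
    ("a2", "a3"): 0.02,
    ("a3", "a4"): 0.05,
    ("a1", "a2"): 0.80,
    ("b1", "b2"): 0.01,
}
clusters = collapse_recent_duplicates(genes, ks_pairs)
labels = classify(clusters, ks_pairs)
for cluster, label in labels.items():
    print(sorted(cluster), label)
```

In this toy case a2, a3, and a4 collapse into one gene that is still counted as duplicated because of its diverged partner a1, whereas b1-b2 collapses into a singleton.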

Lineage-Specific Gene Losses 

Orthologs were defined by reciprocal best hits between dif- 
ferent species by using the results of the all-to-all BLAST. If the 
orthologous relationship was obtained by one-to-one best 
hits, we defined the orthologs as one-to-one. Such a relation- 
ship indicated that there has been no lineage-specific gene 
duplication and loss after speciation. We identified ortholo- 
gous gene clusters using one-to-one orthologous relationships 
for closely related species and their outgroups to investigate 
gene-loss events during evolution (fig. S4A, Supplementary 
Material online). We did not use genes without orthologs 
in the outgroups, because it was not easy to predict their 
ancestral state. If there was no gene-loss event in either of 
the closely related species, we obtained orthologous trios 
where possible (e.g., species 1B-species 2B-outgroup B in
fig. S4B, Supplementary Material online). When orthologous 
trios were not available, we inferred that gene-loss events 
occurred in either lineage of the closely related species (fig. 
S4A, Supplementary Material online). For the comparison, we 
typically used D. melanogaster as the outgroup. When we 
investigated gene-loss events for species that were in the 
clade including D. melanogaster, we used other closely related 
species as the outgroups (tables 1 and 2). Note that there 
were other possible outgroups for the comparisons of some 
species; however, even when we used other outgroups for 
estimating the proportion of lost duplicated genes, our 
result did not change. To identify a species' lost duplicated 
genes generated before speciation from another species, we 
focused on the gene similarity between a species and the 
outgroup species (fig. S4B, Supplementary Material online). 
We defined a species (e.g., species 1 in fig. S4B, Supplementary 
Material online) as having a lost duplicated gene (species 1A 
in fig. S4B, Supplementary Material online) when the follow- 
ing were observed: an inferred ortholog (species 2A in fig. S4B, 
Supplementary Material online) in the compared species 
(species 2 in fig. S4B, Supplementary Material online) was a 
duplicated gene, and the similarity between the duplicated 
gene and its duplicated gene partner (similarity between spe- 
cies 2A and species 2B in fig. S4B, Supplementary Material 
online) was lower than that between either of the duplicated 
copies and its best hit homolog in the outgroup (similarity
between species 2A and outgroup A or that between species 
2B and outgroup B in fig. S4B, Supplementary Material 
online), as determined by a BLAST search. 
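
As a rough illustration of the ortholog logic described in this section, the sketch below derives reciprocal best hits from two best-hit tables and then reports, for each outgroup gene, whether both closely related species retain an ortholog. It is a simplification under stated assumptions: the function names, the status labels, and the gene names are hypothetical, and the additional similarity comparison used to decide whether a lost gene was a duplicate is not modeled.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def classify_outgroup_genes(outgroup_genes, rbh_1o, rbh_2o):
    """For each outgroup gene, report which ingroup species retain an ortholog."""
    in_species1 = {o for _, o in rbh_1o}
    in_species2 = {o for _, o in rbh_2o}
    status = {}
    for o in outgroup_genes:
        if o in in_species1 and o in in_species2:
            status[o] = "trio"                # no loss inferred
        elif o in in_species2:
            status[o] = "absent_in_species1"  # candidate loss in species 1
        elif o in in_species1:
            status[o] = "absent_in_species2"  # candidate loss in species 2
        else:
            status[o] = "unresolved"
    return status


# Toy best-hit tables (made-up gene names): species -> outgroup and back.
best_1o = {"s1_A": "out_A", "s1_B": "out_B"}
best_o1 = {"out_A": "s1_A", "out_B": "s1_B", "out_C": "s1_B"}
best_2o = {"s2_A": "out_A", "s2_C": "out_C"}
best_o2 = {"out_A": "s2_A", "out_B": "s2_A", "out_C": "s2_C"}

rbh_1o = reciprocal_best_hits(best_1o, best_o1)
rbh_2o = reciprocal_best_hits(best_2o, best_o2)
status = classify_outgroup_genes(["out_A", "out_B", "out_C", "out_D"], rbh_1o, rbh_2o)
print(status)
```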

Relative Rate Test for Orthologous Gene Pairs 
To detect fast-evolving genes after the speciation of closely 
related species (fig. S4C, Supplementary Material online), we
conducted a relative rate test using protein sequences aligned
by T-COFFEE for the following closely related species and
their outgroups: D. melanogaster, D. sechellia, D. yakuba (out- 
group); D. pseudoobscura, D. persimilis, D. melanogaster (out- 
group); and D. yakuba, D. erecta, D. melanogaster (outgroup) 
(Tajima 1993). We used the orthologous trios of closely re- 
lated species and their outgroups obtained earlier, and 
counted the number of significant fast-evolving genes for 
each species (table 1). 
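
For readers unfamiliar with the relative rate test of Tajima (1993), a minimal sketch follows. It counts sites at which only one of the two ingroup sequences differs from the outgroup and compares the two counts with a chi-square statistic with one degree of freedom. The function name, the gap handling, and the toy sequences are assumptions made for illustration; the authors' exact settings are not given in the text.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup, gap="-"):
    """Return (m1, m2, chi2, p) for three aligned sequences.

    m1 counts sites where only seq1 differs (seq2 matches the outgroup);
    m2 counts sites where only seq2 differs (seq1 matches the outgroup).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if gap in (a, b, o):
            continue  # skip alignment columns containing a gap
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p


# Toy alignment (made up): seq1 carries eight unique changes, seq2 one.
outgroup = "AAAAAAAAAAAA"
seq1 = "CCCCCCCCAAAA"
seq2 = "AAAAAAAAAAAC"
m1, m2, chi2, p = tajima_relative_rate(seq1, seq2, outgroup)
print(m1, m2, round(chi2, 3), p < 0.05)
```

A gene would be counted as significantly fast-evolving in the lineage with the larger count when the P value falls below the chosen significance level.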

Divergence of Duplicated Gene Pairs in Orthologous 
Trios 

We examined whether an acceleration of the evolutionary 
rates of duplicated genes occurred in species with low habitat 
variability and low Pd. We focused on all the duplicated gene
pairs in the aforementioned dataset of orthologous trios, to 
minimize any effect of lineage-specific extra gene duplications 
on evolutionary rates (fig. S4D, Supplementary Material 
online). Note that as long as a relationship among the three 
species is observed, no gene loss events have occurred in any 
of the lineages of the orthologous trios. An extra gene copy 
generated by lineage-specific gene duplication might cause a 
relaxation of the functional constraints on the gene copies in 
the lineage; therefore, we used duplicated gene pairs derived 
from a duplication event before the speciation of the closely 
related species and their outgroups. Note that no recent gene 
duplication events occurred after speciation in the datasets 
from these trios. We counted the number of duplicated gene 
pairs in which at least one partner was a significantly 
fast-evolving gene for each species (table 1).

Habitat Area and Habitat Variability 
The habitat areas for the Drosophila species were obtained 
from the literature (Ashburner et al. 1982; Piano et al. 1997; 
Reed and Markow 2004; Markow and O'Grady 2006) (online: 
http://scitechlab.wordpress.com/2008/11/02/the-humble- 
fruit-fly-drosophila-melanogaster). Habitat variability was 
estimated from climatic envelope and habitat diversity 
using the Köppen climate classification. Climatic envelope is
the range of temperatures, rainfall, and other climate-related 
parameters in which a species currently exists. We estimated 
climatic envelope using principal component analysis (PCA) 
with WORLDCLIM (Hijmans et al. 2005). We obtained world 
spatial data and the WORLDCLIM climatic dataset (10 minutes
latitude/longitude) from DIVA-GIS (http://www.diva-gis.org).
The habitat area was measured as the number of grid
squares on the climate map. We then extracted the climatic 
values from 19 bioclimatic variables used for BIOCLIM 
(Hijmans et al. 2005) in the habitat area of each Drosophila
species. We performed PCA using the bioclimatic variables for
all of the species, and found that the first two principal components
(PCs) explained 93.4% of the total variance (table S3,
Supplementary Material online). The contributions of PC1 and
PC2 were 79.9% and 13.5%, respectively. PCA plots (x-axis: PC1
and y-axis: PC2) and the correlation circle are shown in figures
S5 and S6, Supplementary Material online, respectively. On
the basis of the PCA results, we also plotted values of PC1 and
PC2 for each species (fig. S5, Supplementary Material online).
We used 107,865 cells (PC1: 799 × PC2: 135), weighting the grid by the
relative contributions of PC1 and PC2, to estimate the climatic
envelope, and defined the number of cells in this 107,865-cell grid
that overlapped data points as the climatic envelope of each
Drosophila species.
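
One way to read this definition is as a count of occupied cells in a 799 × 135 grid laid over the PC1-PC2 plane. The sketch below follows that reading. The equal-width binning, the `bounds` argument (the PC score range covered by the grid), and the toy points are assumptions for illustration; the text does not specify how the grid extents were chosen.

```python
def climatic_envelope(points, bounds, n_pc1=799, n_pc2=135):
    """Count grid cells containing at least one (PC1, PC2) point.

    `bounds` is ((min_pc1, max_pc1), (min_pc2, max_pc2)); all points are
    assumed to lie within these limits.
    """
    (min1, max1), (min2, max2) = bounds
    occupied = set()
    for pc1, pc2 in points:
        # Map each score to a cell index; the maximum value falls in the last cell.
        i = min(int((pc1 - min1) / (max1 - min1) * n_pc1), n_pc1 - 1)
        j = min(int((pc2 - min2) / (max2 - min2) * n_pc2), n_pc2 - 1)
        occupied.add((i, j))
    return len(occupied)


# Toy PC scores (made up); with these bounds each cell is one unit wide.
bounds = ((0.0, 799.0), (0.0, 135.0))
points = [(0.5, 0.5), (0.7, 0.2), (10.1, 3.9), (799.0, 135.0)]
envelope = climatic_envelope(points, bounds)
print(envelope, "of", 799 * 135, "cells occupied")
```

Making the grid 799 cells wide and 135 cells tall mirrors the 79.9% and 13.5% contributions of the two components, so that a unit of distance along PC1 counts for more than the same distance along PC2.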

The Köppen climate classification map was used for estimating
the Drosophila species' habitat diversity (Kottek et al.
2006). This climate map consists of a grid of squares (0.5°
latitude/longitude) in which a certain climate is classified by
temperature, precipitation, and vegetation (fig. S1,
Supplementary Material online). The habitat area was measured
as the number of grid squares on the climate map. The
number of grid squares varied among the Drosophila species, and
therefore habitat diversity was calculated from the variety of
climate classes among the grid squares of each species
(through logarithmic transformation) using Brillouin's
index, which is robust to sample size (Margalef 1958).
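
Brillouin's index for a collection of N grid squares, of which n_i fall in climate class i, is H = (ln N! - Σ ln n_i!) / N. A minimal sketch follows; the natural logarithm is an assumption, since the base is not stated, and the class labels in the example are ordinary Köppen codes attached to made-up grid squares.

```python
import math
from collections import Counter


def brillouin_index(labels):
    """Brillouin's index H = (ln N! - sum(ln n_i!)) / N for a list of class labels."""
    counts = Counter(labels).values()
    n_total = sum(counts)
    if n_total == 0:
        raise ValueError("at least one grid square is required")
    # lgamma(n + 1) equals ln(n!) and stays stable for large counts.
    log_fact_total = math.lgamma(n_total + 1)
    log_fact_parts = sum(math.lgamma(c + 1) for c in counts)
    return (log_fact_total - log_fact_parts) / n_total


# Toy habitats (made up): one climate class versus two equally common classes.
uniform = ["Af"] * 5
mixed = ["Af", "Af", "Cfa", "Cfa"]
print(brillouin_index(uniform))
print(brillouin_index(mixed))
```

A habitat covered by a single climate class scores zero, and the index rises as grid squares are spread more evenly over more classes.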

Model Selection 

All of the following statistical analyses were executed in R 
(http://www.r-project.org). We applied model selection 
using regression to examine which genomic factors affect 
habitat features (bioclimatic variables and habitat area). We 
explored the set of predictors of the explanatory variables 
using the stepwise Akaike's Information Criterion procedure, 
and determined the set of variables that yielded the lowest 
score. In addition, we conducted a multivariate analysis of 
variance (MANOVA) in which two variables (climatic enve- 
lope and habitat area) were used as response variables, and 
genome size, number of genes, and Pd were used as explanatory
variables (tables S4 and S5, Supplementary Material
online). 

To remove any phylogenetic constraints on the relation- 
ship between genetic architecture and habitat, we used a 
robust phylogeny derived from Drosophila 12 Genomes 
Consortium (Clark et al. 2007). Using this phylogenetic tree, 
we selected a model by applying the generalized least squares 
model with the Brownian model, as described earlier, and
measured the phylogenetically independent contrasts (PICs) 
(Felsenstein 1985). We performed linear regression analyses 
for selected explanatory variables using the estimated PICs. 
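
The contrast calculation of Felsenstein (1985) can be sketched on a small tree as follows. The nested-tuple tree encoding, the function names, and the toy trait values are assumptions for illustration only. Regression of contrasts is conventionally forced through the origin, which is what the helper below does; the text does not state whether the authors applied this constraint.

```python
import math


def pic(tree, trait):
    """Phylogenetically independent contrasts for a fully bifurcating tree.

    A tip is a string name; an internal node is a pair
    ((left_child, left_branch_length), (right_child, right_branch_length)).
    """
    contrasts = []

    def visit(node):
        # Returns (trait estimate at node, extra length to add to its stem branch).
        if isinstance(node, str):
            return trait[node], 0.0
        (left, v1), (right, v2) = node
        x1, extra1 = visit(left)
        x2, extra2 = visit(right)
        v1 += extra1
        v2 += extra2
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))  # standardized contrast
        estimate = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
        return estimate, v1 * v2 / (v1 + v2)

    visit(tree)
    return contrasts


def slope_through_origin(x, y):
    """Least-squares slope for a regression line constrained to pass through 0."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Toy tree ((A:1, B:1):1, C:2) with made-up trait values.
tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
trait = {"A": 1.0, "B": 3.0, "C": 5.0}
contrasts = pic(tree, trait)
print(contrasts)
```

Computing contrasts for both the genomic variable and the habitat variable on the same tree, then passing the two lists to `slope_through_origin`, gives the phylogenetically corrected relationship.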

Gene Ontology 

To investigate whether the lineage-specific lost or fast-evolving
duplicated genes of species with low Pd were
enriched in some particular functional categories, we exam- 
ined the Gene Ontology (GO) database entries for the 
duplicated genes between species with different Pd
(D. melanogaster-D. sechellia and D. pseudoobscura-D. persimilis).
The GO identifiers (ids) and GO "slim" annotations
for the biological processes of D. melanogaster
were downloaded from ftp://ftp.geneontology.org/pub/go/gene-associations/
and ftp://ftp.geneontology.org/pub/go/GO_slims,
respectively. We excluded those classified as
GO:0008150 (biological process unknown). The frequency 
of each GO id was counted for the D. melanogaster genes. 
For the other species, we used the GO ids of the most similar 
homolog in D. melanogaster. To analyze the GO data for 
genes that had been lost in the D. melanogaster lineage 
(mel1 in fig. S2C, Supplementary Material online), we used
the GO id of the most similar homolog retained in
D. melanogaster (mel2 in fig. S2C, Supplementary Material
online) for the orthologs retained in the D. sechellia
genome (sec1 in fig. S2C, Supplementary Material online) of
the lost genes. The enrichment of GO ids for the genes in
species having a low Pd was compared with that in species
having a high Pd. We calculated the P value for each GO id by
comparing two different gene sets. The estimated P values 
were adjusted using Bonferroni correction. 
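
The text does not name the test used to compare the two gene sets. As one plausible illustration, the sketch below applies a one-sided Fisher's exact test (written as a hypergeometric tail) to each GO id and then applies the Bonferroni correction. The choice of test, the function names, and the term labels are assumptions and should not be read as the authors' procedure.

```python
from math import comb


def fisher_one_sided(k, n, K, N):
    """P(X >= k) under the hypergeometric distribution.

    N: genes in both sets combined; K: genes carrying the GO id in both sets;
    n: genes in the first set; k: genes in the first set carrying the GO id.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bonferroni(pvalues):
    """Multiply each P value by the number of tests, capping the result at 1."""
    m = len(pvalues)
    return {term: min(1.0, p * m) for term, p in pvalues.items()}


# Toy counts (made up): 10 genes per set; the term labels are placeholders.
raw = {
    "term_x": fisher_one_sided(k=5, n=10, K=5, N=20),  # 5 of 10 versus 0 of 10
    "term_y": fisher_one_sided(k=2, n=10, K=4, N=20),  # 2 of 10 versus 2 of 10
}
adjusted = bonferroni(raw)
print(raw)
print(adjusted)
```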

Results and Discussion 

Recently Duplicated Genes 

We identified duplicated genes by similarity search (blastp) 
for each species and estimated the synonymous substitution 
rate (Ks) between a duplicated gene and its closest paralogue 
of 11 fully sequenced Drosophila species. We observed that 
the duplicated gene pairs tended to have Ks < 0.1 (fig. S3, 
Supplementary Material online). This observation is consis- 
tent with high rates of gene duplications and losses (Lynch 
and Conery 2000). There was an apparent bias in the number of
recent duplication events in particular species. Although the 
recent burst of gene duplications observed in some particular 
lineages is biologically feasible, we question the reliability of 
the enrichment of recently duplicated gene pairs. Indeed, it is 
difficult to distinguish recently duplicated genes from artifacts 
of genome assembly. On the other hand, diverged duplicated 
genes maintained in a genome are obviously derived from 
ancient gene duplication events (not artifacts), and most of 
the substitutions can be attributed to diverged duplicated 
genes (not recently duplicated genes). Therefore, we focused 
on diverged duplicated genes in the following analyses, and 
homologous gene pairs with a Ks < 0.1 were collapsed into a
single gene (see Materials and Methods). We found that there 
was a significant positive correlation between the proportion 
of total duplicated genes including recently duplicated genes 
and climatic envelope estimated by bioclimatic variables 
(R^2 = 0.62, P = 0.0066; see the next section for details), but
there was no significant correlation between the proportion 
of only recently duplicated genes and climatic envelope 
(R^2 = 0.26, P = 0.13). This observation indicates that in comparison
with evolutionarily maintained duplicates, recent duplicates
are unlikely to contribute to the climatic envelope.
This could be attributed to the lower divergence of recently 
duplicated genes. 



Pd Associated with Habitat Diversity 
To investigate the relationship between genomic architecture 
and habitat, we employed a linear model in which genome 
size (Nardon et al. 2005; Bosco et al. 2007), Pd, and the
number of genes were used as explanatory variables, and climatic
envelope and habitat area were used as response variables,
removing the phylogenetic constraints. Pd was selected
as the sole explanatory variable for climatic envelope
(R^2 = 0.82, P = 0.00032; fig. 2). Pd and number of genes were
selected as explanatory variables for habitat area, but the regression
coefficient was statistically significant only for the
former (R^2 = 0.45, P = 0.024); these results were not changed
by using a MANOVA for the two response variables (table S4,
Supplementary Material online). We then examined the effects
of climatic envelope and habitat area on Pd, and only
climatic envelope was selected as an explanatory variable.
These results indicate that Pd is strongly correlated with climatic
envelope.

We were concerned that two extreme contrasts might be 
driving the relationship in figure 2B. The two extreme con- 
trasts were generated by the large climatic envelope values for 
D. melanogaster and D. pseudoobscura, which have closely 
related species with opposite features, that is, D. sechellia 
and D. persimilis, respectively. However, even when we removed
D. melanogaster and D. pseudoobscura from our analyses,
we observed the same trends (R^2 = 0.58, P = 0.029; fig.
S7A, Supplementary Material online). 

Some species' habitats are known to have been ex- 
panded by human activity, particularly in the case of D. mel- 
anogaster, which has been spread around the world. We 
therefore repeated the analysis without D. melanogaster, 
and confirmed that the results were not affected (table S4, 
Supplementary Material online). It has been reported that 
D. virilis is a holarctic species (Ashburner et al. 1982; Mirol
et al. 2008). When D. virilis and/or D. melanogaster were removed
from the analysis, the results did not change (data not
shown). 

We suspected that the differences in genome coverage
among Drosophila species might correlate with Pd, and therefore
examined this relationship. We obtained data on the
genome coverage of all Drosophila species except for D. melanogaster
from EnsemblMetazoa, and found no correlation
between the genome coverage and Pd (R^2 = 0.014,
P = 0.76).

To reinforce our results, we investigated the relationship
between Pd and the habitat diversity of Drosophila
within their range based on the Köppen climate classification.
The classification considers not only temperature and precipitation
but also vegetation (Kottek et al. 2006).
Environmental diversity within the habitat was estimated
using Brillouin's index (Margalef 1958). Similar results
were obtained even when we used a different measure
of environmental variability with a different climatic dataset
(R^2 = 0.93, P = 7.7 X 10~^; fig. S8 and table S5, Supplementary
Material online). These results strongly support the contention
that in Drosophila species, Pd is correlated with habitat
variability.
Fig. 2. Correlation between climatic envelope and Pd. (A) Relationship
between climatic envelope and Pd. The x-axis indicates Pd for 11
Drosophila species (mel, D. melanogaster; sec, D. sechellia; yak,
D. yakuba; ere, D. erecta; ana, D. ananassae; pse, D. pseudoobscura;
per, D. persimilis; wil, D. willistoni; moj, D. mojavensis; vir, D. virilis; and
gri, D. grimshawi). The y-axis indicates climatic envelope estimated by
WORLDCLIM datasets. (B) Relationship between contrasts in climatic
envelope and Pd. The x-axis indicates PICs in Pd, and the y-axis indicates
PICs in the climatic envelope. The dashed line represents the regression
line.



The Influence of Effective Population Size on
Genomic Architecture 

It has been proposed that there are correlations between 
effective population size and genomic contents (Lynch and 
Conery 2003). In the genus Drosophila, several studies have 
shown that the evolutionary rates of genes are faster in the 
host-specific species D. sechellia, which has a small effective 
population size, than in the cosmopolitan species D. simulans 
(Kliman et al. 2000; McBride 2007). Similarly, Singh et al. 
(2009) observed that the evolutionary rates of genes are 
likely to be accelerated in the host-specific species D. sechellia 
and D. erecta, which have smaller effective population sizes, 
compared with D. melanogaster and D. yakuba. These studies 
tend to suggest that the genes of species with small population
sizes evolve fast, possibly due to less effective natural
selection. Petit and Barbadilla (2009) examined the effective
population sizes of many of the Drosophila species used in the 
present study and reported that selection efficiency is corre- 
lated with effective population size, which, in turn, is corre- 
lated with levels of genomic codon bias, proportion of 
adaptive substitutions, and repetitive sequences. We there- 
fore examined the relationship between effective population 
size and climatic envelope using synonymous polymorphism 
in the genes of the seven Drosophila species (D. melanogaster, 
D. sechellia, D. yakuba, D. erecta, D. mojavensis, and D. virilis)
reported in Petit and Barbadilla (2009). Our results showed 
that there was no significant correlation between effective 
population size and climatic envelope (R^2 = 0.061, P = 0.63;
fig. S9A, Supplementary Material online). In addition, the re- 
sults of a linear model, in which effective population size and 
climatic envelope were used as explanatory variables, showed 
that Pd was explained by climatic envelope (R^2 = 0.85,
P = 0.0089) but not by the effective population size. Further, 
we examined the relationship between Pd and climatic envelope
after removing species with a small effective population
size (D. sechellia and D. erecta), because the correlation be- 
tween selection efficiency and population size was strong 
when the host-specific species were compared with general- 
ist species (Petit and Barbadilla 2009; Singh et al. 2009). 
However, we still found a significant correlation between 
Pd and climatic envelope (R^2 = 0.86, P = 0.00086; fig. S9B,
Supplementary Material online), suggesting that differences 
in Pd among species are not explained by the effective 
population sizes. 

Evolutionary Processes in Divergence of Duplicate 
Content 

We next investigated the evolutionary processes responsible 
for differences in the Pd between closely related species with 
different habitat variability. First, we examined the conserva- 
tion of Pd by fitting it to the phylogenetic tree using a 
Brownian motion model and calculating Pagel's lambda 
(Pagel 1999). We found that lambda was 1.2x10"^, 
and that the value differed significantly from 1 under 
Brownian motion evolution by comparison of likelihoods 
(P = 3.3 X 10~^). This indicates that the phylogeny does not 
explain the distribution of Pd among the Drosophila species. 
We therefore examined whether the loss of duplicated genes 
occurred more frequently in species with low Pd. We focused
on two species pairs, D. melanogaster-D. sechellia and 
D. pseudoobscura-D. persimilis, in which the habitat variabil- 
ity and Pd differed even though they were closely related 
phylogenetically (figs. 2A and S8A, Supplementary Material 
online) and found that species with low habitat variability and 
low Pd (D. sechellia and D. persimilis) tended to lose dupli- 
cated genes (figs. S4A and S4B, Supplementary Material 
online, and table 1). D. pseudoobscura and D. persimilis di- 
verged more recently than did D. melanogaster and D. sechel- 
lia, and, in fact, the former species pair can easily interbreed. 
Even though divergence times were different between species 
pairs, we observed consistent results in which species with 
low habitat variability tended to lose duplicated genes, when 
Fig. 3. Correlation between climatic envelope (or Pd) and the Pd of lost genes. (A) Relationship between climatic envelope and the Pd of lost genes. The
x-axis indicates the proportion of lost duplicated genes for 11 Drosophila species (mel, D. melanogaster; sec, D. sechellia; yak, D. yakuba; ere, D. erecta; ana,
D. ananassae; pse, D. pseudoobscura; per, D. persimilis; wil, D. willistoni; moj, D. mojavensis; vir, D. virilis; and gri, D. grimshawi). The y-axis indicates climatic
envelope estimated by WORLDCLIM datasets. Error bars indicate standard error for the Pd of lost genes derived from table 2. (B) Relationship between
contrasts in climatic envelope and the average proportion of lost duplicated genes for each species. The x-axis indicates PICs in the Pd of lost genes, and
the y-axis indicates PICs in the climatic envelope. The dashed line represents the regression line. (C) The Pd of total genes for each species (black) and
the average proportion of lost duplicated genes for each branch (gray) on the phylogenetic tree. The phylogenetic tree is from Drosophila 12
Genomes Consortium (2007). (D) Relationship between contrasts in the Pd of total genes and the average proportion of lost duplicated genes for
each species. The x-axis indicates PICs in the Pd of lost genes, and the y-axis indicates PICs in the Pd of total genes. The dashed line represents the
regression line.



we expanded the estimation for all species used in our study 
(table 2). In addition, we found a strong negative
correlation between the loss rates of duplicated genes
and climatic envelope among the species, even after
phylogenetic constraints were removed (R² = 0.92, P = 1.1 x 10~^;
fig. 3). Furthermore, we found a significant negative
correlation between Pd and the loss rates of duplicated genes
(R² = 0.73, P = 0.0017; fig. 3C and D). Note that the extreme
contrasts in figure 3B and D did not drive the relationships
(fig. S7B, Supplementary Material online; R² = 0.78, P = 0.0038
and fig. S7C; R² = 0.45, P = 0.068). The negative correlation in
figure S7C, Supplementary Material online, after removal of
the extreme contrasts was not statistically significant, but this
is possibly due to the low statistical power of the small dataset.
This indicates that the functional constraints on duplicated
genes of species with low habitat variability are more relaxed
than those of species with high habitat variability. D. sechellia
is thought to have lost Or and Gr genes associated with odor
response in compensation for specializing on Morinda citrifolia,
which is toxic to other Drosophila (McBride 2007; McBride
et al. 2007). This is likely to be a case of antagonistic pleiot- 
ropy (trade-offs) (Hoffmann 2010). McBride reported that 
not only Or/Gr genes but also randomly chosen genes in 
D. sechellia were fast-evolving compared with those in the 
closely related cosmopolitan species D. simulans (McBride 
2007). Our findings derived from genome-wide analyses suggest
that DNA decay occurred in climatic specialists rather
than generalists (Hoffmann and Willi 2008; Hoffmann 2010),
although it is difficult to distinguish this hypothesis from that
of antagonistic pleiotropy (Hoffmann 2010). However, we
suggest that species with low habitat variability might have
lost the functional constraints on genes in general. We
conducted a relative rate test (Tajima 1993) to detect
lineage-specific fast-evolving genes to understand the general
trends in differences in the functional constraints on genes
between closely related species with different habitat
ranges. The number of fast-evolving genes in species with
high habitat variability and high Pd (D. melanogaster and
D. pseudoobscura) was significantly smaller than that in species
with low habitat variability and low Pd (D. sechellia,
P < 2.2 × 10⁻¹⁶, χ² test; and D. persimilis, P < 2.2 × 10⁻¹⁶,
χ² test; fig. S4C, Supplementary Material online, and table
1). Furthermore, duplicated gene pairs including significantly
fast-evolving genes were enriched in species with low habitat
variability and low Pd (D. sechellia, P < 2.2 × 10⁻¹⁶, χ² test;
and D. persimilis, P < 2.2 × 10⁻¹⁶, χ² test; fig. S4D,
Supplementary Material online, and table 1). These results
imply that the low Pd in species with low habitat variability
was caused by losses both of duplicated genes and of sequence
similarity between duplicated gene pairs.
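
The relative rate test mentioned above can be illustrated with a minimal Python sketch. It assumes three pre-aligned sequences of equal length (two ingroup lineages and an outgroup) and follows the site-counting form of Tajima's (1993) test: sites changed only in lineage 1 (m1) are compared with sites changed only in lineage 2 (m2) using a χ² statistic with one degree of freedom. The function name, the gap handling, and the toy sequences are illustrative assumptions, not the pipeline used in this study.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup):
    """Site-counting relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p). Hypothetical helper for illustration only.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue  # assumption: columns containing gaps are skipped
        if b == c and a != b:
            m1 += 1  # change unique to lineage 1
        elif a == c and b != a:
            m2 += 1  # change unique to lineage 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square variable with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p

# Toy example: lineage 1 carries four unique changes, lineage 2 none
print(tajima_relative_rate("TCGTTCGTTT", "ACGTACGTAC", "ACGTACGTAC"))
```

A gene would be called fast-evolving in lineage 1 when m1 exceeds m2 and the P value falls below a chosen threshold; the threshold and any multiple-testing correction are left open here.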

We examined whether lineage-specific lost and
fast-evolving genes in species with low Pd were enriched in
particular functional categories using gene ontology
(http://www.geneontology.org). We detected little enrichment
of functional categories for the genes in species with low
Pd; the lost genes in D. sechellia and D. persimilis were
enriched only in metabolic process (P = 1.5 x 10~^) and
response to stimulus (P = 9.9 x 10""^), respectively. Note that
the enrichment of lost genes related to metabolic process in
D. sechellia could be caused by a trade-off associated with
specializing on the fruits of M. citrifolia, which contain
substrates toxic to other Drosophila species (Markow and
O'Grady 2007). This indicates that both gene loss and the
relaxation of functional constraints are common for genes in
species with low habitat variability, rather than being specific to
particular genes. Species would need not only cold and
desiccation tolerances but also physiological, morphological,
behavioral, and certain other adaptations to live in
heterogeneous environments.
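
The text does not state which statistical test was used for the gene ontology enrichment. One common choice is a one-sided hypergeometric test, sketched below in Python under that assumption. The function name and the counts in the example are hypothetical and are not taken from this study.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, category_genes, sample_size, sample_hits):
    """P(X >= sample_hits) when sample_size genes are drawn without
    replacement from total_genes, of which category_genes belong to the
    functional category of interest."""
    max_hits = min(category_genes, sample_size)
    denom = comb(total_genes, sample_size)
    tail = sum(
        comb(category_genes, k)
        * comb(total_genes - category_genes, sample_size - k)
        for k in range(sample_hits, max_hits + 1)
    )
    return tail / denom

# Hypothetical counts: 10 genes in total, 4 in the category,
# 3 lost genes of which 2 fall in the category
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In practice the P values from many categories would also need a multiple-testing correction, which this sketch omits.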

We also conducted the analyses using the closely related 
species D. yakuba and D. erecta; although D. yakuba has a 
wider distribution in Africa than the specialist D. erecta 
(Markow and O'Grady 2006), both species inhabit tropical 
regions and have similar Pd. We found no significant difference
in the Pd of lineage-specific lost genes between these
species, which was consistent with our hypothesis (table 1).
However, the duplicated gene pairs containing significantly
fast-evolving genes and the fast-evolving genes themselves
were both enriched in D. erecta (P < 2.2 × 10⁻¹⁶, χ² test;
table 1). Notably, this difference was smaller than that
for species pairs with different habitat variability and Pd
(D. melanogaster-D. sechellia and D. pseudoobscura-D. persimilis;
table 1). Although both D. yakuba and D. erecta have
low habitat variability, these results might also be affected by 
the host specificity, narrow habitat area and/or small popu- 
lation size of D. erecta. Overall, our results suggest that in 
species with low habitat variability, duplicated genes have 
been lost from the genome, whereas in species with high 
habitat variability, high Pd has been maintained in the 
genome. 
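
The species-pair comparisons above rely on χ² tests of gene counts. A minimal Python sketch of a Pearson χ² test on a 2 × 2 table follows, for example fast-evolving versus other genes in each of two species. Whether a continuity correction was applied in this study is not stated, so the sketch omits it; the function name and the counts are hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the
    2 x 2 table [[a, b], [c, d]]. Returns (chi2, p) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical counts: rows are species, columns are
# (fast-evolving genes, remaining genes)
print(chi2_2x2(30, 70, 10, 90))
```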



Cause and Effect of Habitat Variability 
Adaptation to homogeneous environments (e.g., host special- 
ization) is probably the main cause of habitat range restric- 
tion, because our results show that species with low habitat 
variability and low Pd tend to lose duplicated genes (fig. 3, 
tables 1 and 2). In addition, we showed that there is no
evidence that species with high habitat variability
have gained a greater number of duplicated genes than those
with low habitat variability. This indicates that habitat
variability cannot be the cause of increasing Pd. We propose that
selection for retaining genetic diversity operated efficiently in
species with high habitat variability. Under this selection,
duplicated genes in species with high habitat variability were
maintained, in contrast to those in species with low habitat
variability. Therefore, once species have adapted to homogeneous
environments and lost duplicated genes, that loss could
restrict them to habitats with lower variability.
Compared with more generalist species, host-specific species
(D. sechellia, D. erecta, and D. mojavensis) and island endemic
species (D. sechellia and D. grimshawi) are unable to expand
their distributions to heterogeneous environments due to a
lack of the genetic variation
conferred by retention of duplicated genes. Therefore, Pd can
be both a cause and an effect of habitat variability in 
Drosophila species (fig. 3, tables 1 and 2). 

Conclusion 

Our findings show that the Pd in a genome strongly correlates 
with the habitat variability of a species. Variable environments 
within a species' range must promote the maintenance of 
duplicated genes. A recent study predicted that duplicated 
genes could be maintained in gene regulatory networks in 
randomly fluctuating environments (Tsuda and Kawata 
2010). The expression of duplicated genes was more diverse 
than that of singletons (Kliebenstein 2008; Ha et al. 2009; 
Dong et al. 2011), and therefore, individuals with more du- 
plicated genes have advantages in diverse environments be- 
cause they produce more genetically variable offspring. 
Kellermann et al. (2009) showed that specialist species 
lacked genetic variation in key traits, thereby limiting their 
ability to adapt to changed conditions. Our results indicate 
that genetic and genomic architectures, such as the Pd in a
genome, are fundamental constraints on the production of 
genetic variation for adaptation to new and varied 
environments. 

Many of the whole genome sequences in the database 
were determined from inbred individuals. Therefore, these 
sequences do not provide information about the genetic var- 
iation of the population. Although species have gene copy 
number variations in their genomes, it is highly unlikely that 
inbreeding or the founder effect immediately reduces their Pd. 
Therefore, Pd can be estimated even from the genomic se- 
quences of inbred lines as a representative value of an indi- 
vidual of a population. We suggest that Pd is an excellent 
genetic indicator for adaptation to habitat diversity. Whole 
genomes can now be sequenced comparatively easily, and 
techniques continue to rapidly improve (Metzker 2010).
Further analyses of duplicated genes in additional species will 
clarify the relationship between genetic factors and habitat 
distributions that depend on habitat variability. If the
relationship between Pd and habitat variability applies to other
organisms, it would allow us to predict which species are unlikely to
survive environmental change, which could aid future
biodiversity conservation efforts. This study provides the first
evidence that genome-wide duplicated gene content determines
ecological traits. Our results provide new insight into
the evolution of duplicated genes: their maintenance
might confer an ecological advantage to an organism
during evolution.

Supplementary Material

Supplementary figures S1-S9 and tables S1-S5 are available
at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).